Learning to Tag from Open Vocabulary Labels

Internal identifier: 000D85 (Main/Exploration); previous: 000D84; next: 000D86

Authors: Edith Law [United States]; Burr Settles [United States]; Tom Mitchell [United States]

Source:

Lecture Notes in Computer Science [ 0302-9743 ] ; 2010.

RBID : ISTEX:0A645893CDD5762C7749FF9B0CDC78E44DBDB935

Abstract

Most approaches to classifying media content assume a fixed, closed vocabulary of labels. In contrast, we advocate machine learning approaches which take advantage of the millions of free-form tags obtainable via online crowd-sourcing platforms and social tagging websites. The use of such open vocabularies presents learning challenges due to typographical errors, synonymy, and a potentially unbounded set of tag labels. In this work, we present a new approach that organizes these noisy tags into well-behaved semantic classes using topic modeling, and learns to predict tags accurately using a mixture of topic classes. This method can utilize an arbitrary open vocabulary of tags, reduces training time by 94% compared to learning from these tags directly, and achieves comparable performance for classification and superior performance for retrieval. We also demonstrate that on open vocabulary tasks, human evaluations are essential for measuring the true performance of tag classifiers, which traditional evaluation methods will consistently underestimate. We focus on the domain of tagging music clips, and demonstrate our results using data collected with a human computation game called TagATune.
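
The phrase "a mixture of topic classes" can be read as scoring each tag by summing, over topics, the probability of the topic for a clip times the probability of the tag within that topic. The sketch below illustrates only that reading. It is not the authors' implementation: the function name `tag_scores`, the two topics, and all probabilities are made-up assumptions, and both the topic model and the per-clip topic classifier are assumed to have been trained already.

```python
def tag_scores(topic_given_clip, tag_given_topic):
    """Score each tag as sum_k P(topic k | clip) * P(tag | topic k)."""
    scores = {}
    for k, p_topic in enumerate(topic_given_clip):
        for tag, p_tag in tag_given_topic[k].items():
            scores[tag] = scores.get(tag, 0.0) + p_topic * p_tag
    return scores


# Hypothetical topic-to-tag distributions (illustrative numbers only).
# The misspelling "clasical" stands in for a noisy free-form tag that a
# topic model might group with its correctly spelled neighbours.
TAG_GIVEN_TOPIC = [
    {"piano": 0.5, "classical": 0.4, "clasical": 0.1},
    {"guitar": 0.6, "rock": 0.4},
]

# Hypothetical classifier output for one clip: P(topic | clip).
TOPIC_GIVEN_CLIP = [0.75, 0.25]

scores = tag_scores(TOPIC_GIVEN_CLIP, TAG_GIVEN_TOPIC)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Under this reading, the classifier only has to predict a handful of topic classes rather than one output per distinct tag, which is one plausible way to see why training on topics could be cheaper than training on the raw tags.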

Url:

https://api.istex.fr/document/0A645893CDD5762C7749FF9B0CDC78E44DBDB935/fulltext/pdf

DOI: 10.1007/978-3-642-15883-4_14

Affiliations:

Links to previous steps (curation, corpus...)

to stream Istex, to step Corpus: 000765
to stream Istex, to step Curation: 000765
to stream Istex, to step Checkpoint: 000157
to stream Main, to step Merge: 000D95
to stream Main, to step Curation: 000D85

The document in XML format

<record><TEI wicri:istexFullTextTei="biblStruct:series"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Learning to Tag from Open Vocabulary Labels</title>
<author><name sortKey="Law, Edith" sort="Law, Edith" uniqKey="Law E" first="Edith" last="Law">Edith Law</name>
</author>
<author><name sortKey="Settles, Burr" sort="Settles, Burr" uniqKey="Settles B" first="Burr" last="Settles">Burr Settles</name>
</author>
<author><name sortKey="Mitchell, Tom" sort="Mitchell, Tom" uniqKey="Mitchell T" first="Tom" last="Mitchell">Tom Mitchell</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:0A645893CDD5762C7749FF9B0CDC78E44DBDB935</idno>
<date when="2010" year="2010">2010</date>
<idno type="doi">10.1007/978-3-642-15883-4_14</idno>
<idno type="url">https://api.istex.fr/document/0A645893CDD5762C7749FF9B0CDC78E44DBDB935/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000765</idno>
<idno type="wicri:Area/Istex/Curation">000765</idno>
<idno type="wicri:Area/Istex/Checkpoint">000157</idno>
<idno type="wicri:doubleKey">0302-9743:2010:Law E:learning:to:tag</idno>
<idno type="wicri:Area/Main/Merge">000D95</idno>
<idno type="wicri:Area/Main/Curation">000D85</idno>
<idno type="wicri:Area/Main/Exploration">000D85</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Learning to Tag from Open Vocabulary Labels</title>
<author><name sortKey="Law, Edith" sort="Law, Edith" uniqKey="Law E" first="Edith" last="Law">Edith Law</name>
<affiliation wicri:level="4"><country>États-Unis</country>
<placeName><settlement type="city">Pittsburgh</settlement>
<region type="state">Pennsylvanie</region>
</placeName>
<orgName type="university">Université Carnegie-Mellon</orgName>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">États-Unis</country>
</affiliation>
</author>
<author><name sortKey="Settles, Burr" sort="Settles, Burr" uniqKey="Settles B" first="Burr" last="Settles">Burr Settles</name>
<affiliation wicri:level="4"><country>États-Unis</country>
<placeName><settlement type="city">Pittsburgh</settlement>
<region type="state">Pennsylvanie</region>
</placeName>
<orgName type="university">Université Carnegie-Mellon</orgName>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">États-Unis</country>
</affiliation>
</author>
<author><name sortKey="Mitchell, Tom" sort="Mitchell, Tom" uniqKey="Mitchell T" first="Tom" last="Mitchell">Tom Mitchell</name>
<affiliation wicri:level="4"><country>États-Unis</country>
<placeName><settlement type="city">Pittsburgh</settlement>
<region type="state">Pennsylvanie</region>
</placeName>
<orgName type="university">Université Carnegie-Mellon</orgName>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">États-Unis</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s">Lecture Notes in Computer Science</title>
<imprint><date>2010</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">0A645893CDD5762C7749FF9B0CDC78E44DBDB935</idno>
<idno type="DOI">10.1007/978-3-642-15883-4_14</idno>
<idno type="ChapterID">Chap14</idno>
<idno type="ChapterID">14</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: Most approaches to classifying media content assume a fixed, closed vocabulary of labels. In contrast, we advocate machine learning approaches which take advantage of the millions of free-form tags obtainable via online crowd-sourcing platforms and social tagging websites. The use of such open vocabularies presents learning challenges due to typographical errors, synonymy, and a potentially unbounded set of tag labels. In this work, we present a new approach that organizes these noisy tags into well-behaved semantic classes using topic modeling, and learn to predict tags accurately using a mixture of topic classes. This method can utilize an arbitrary open vocabulary of tags, reduces training time by 94% compared to learning from these tags directly, and achieves comparable performance for classification and superior performance for retrieval. We also demonstrate that on open vocabulary tasks, human evaluations are essential for measuring the true performance of tag classifiers, which traditional evaluation methods will consistently underestimate. We focus on the domain of tagging music clips, and demonstrate our results using data collected with a human computation game called TagATune.</div>
</front>
</TEI>
<affiliations><list><country><li>États-Unis</li>
</country>
<region><li>Pennsylvanie</li>
</region>
<settlement><li>Pittsburgh</li>
</settlement>
<orgName><li>Université Carnegie-Mellon</li>
</orgName>
</list>
<tree><country name="États-Unis"><region name="Pennsylvanie"><name sortKey="Law, Edith" sort="Law, Edith" uniqKey="Law E" first="Edith" last="Law">Edith Law</name>
</region>
<name sortKey="Law, Edith" sort="Law, Edith" uniqKey="Law E" first="Edith" last="Law">Edith Law</name>
<name sortKey="Mitchell, Tom" sort="Mitchell, Tom" uniqKey="Mitchell T" first="Tom" last="Mitchell">Tom Mitchell</name>
<name sortKey="Mitchell, Tom" sort="Mitchell, Tom" uniqKey="Mitchell T" first="Tom" last="Mitchell">Tom Mitchell</name>
<name sortKey="Settles, Burr" sort="Settles, Burr" uniqKey="Settles B" first="Burr" last="Settles">Burr Settles</name>
<name sortKey="Settles, Burr" sort="Settles, Burr" uniqKey="Settles B" first="Burr" last="Settles">Burr Settles</name>
</country>
</tree>
</affiliations>
</record>
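
As a rough illustration of how the record above could be read programmatically, the following Python sketch extracts the title, authors, DOI, and RBID with the standard library. It works on a trimmed copy of the record. Two points are assumptions rather than facts about Wicri or Dilib: the `wicri:` prefix appears not to be declared in the record as shown, so the sketch binds it to a placeholder namespace URI (`urn:example:wicri`) before parsing, and the helper name `parse_record` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the record above: title, authors, and two identifiers only.
RECORD = """<record><TEI wicri:istexFullTextTei="biblStruct:series"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Learning to Tag from Open Vocabulary Labels</title>
<author><name sortKey="Law, Edith" sort="Law, Edith" uniqKey="Law E" first="Edith" last="Law">Edith Law</name>
</author>
<author><name sortKey="Settles, Burr" sort="Settles, Burr" uniqKey="Settles B" first="Burr" last="Settles">Burr Settles</name>
</author>
<author><name sortKey="Mitchell, Tom" sort="Mitchell, Tom" uniqKey="Mitchell T" first="Tom" last="Mitchell">Tom Mitchell</name>
</author>
</titleStmt>
<publicationStmt><idno type="RBID">ISTEX:0A645893CDD5762C7749FF9B0CDC78E44DBDB935</idno>
<idno type="doi">10.1007/978-3-642-15883-4_14</idno>
</publicationStmt>
</fileDesc>
</teiHeader>
</TEI>
</record>"""

# Assumption: placeholder URI, not an official Wicri namespace.
WICRI_NS = "urn:example:wicri"


def parse_record(xml_text):
    """Return title, authors, DOI, and RBID from a Wicri-style record."""
    # Bind the undeclared wicri: prefix on the root so the parser accepts it.
    patched = xml_text.replace(
        "<record>", '<record xmlns:wicri="%s">' % WICRI_NS, 1
    )
    root = ET.fromstring(patched)
    title = root.findtext(".//titleStmt/title")
    authors = [name.text for name in root.findall(".//titleStmt/author/name")]
    idnos = {
        idno.get("type"): idno.text
        for idno in root.findall(".//publicationStmt/idno")
    }
    return {
        "title": title,
        "authors": authors,
        "doi": idnos.get("doi"),
        "rbid": idnos.get("RBID"),
    }


info = parse_record(RECORD)
print(info["title"])
print("; ".join(info["authors"]))
```

The same function should also accept the full record, since it only looks up elements under `titleStmt` and `publicationStmt`.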

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Musique/explor/OperaV1/Data/Main/Exploration

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000D85 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000D85 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Musique
   |area=    OperaV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:0A645893CDD5762C7749FF9B0CDC78E44DBDB935
   |texte=   Learning to Tag from Open Vocabulary Labels
}}

This area was generated with Dilib version V0.6.21.
Data generation: Thu Apr 14 14:59:05 2016. Site generation: Thu Jan 4 23:09:23 2024

	Opera exploration server
	Warning: this site is under development. It was generated by automated means from raw corpora, so the information has not been validated.
